
    A corpus-based semantic kernel for text classification by using meaning values of terms

    Text categorization plays a crucial role in both academic and commercial platforms due to the growing demand for automatic organization of documents. Kernel-based classification algorithms such as Support Vector Machines (SVM) have become highly popular in text mining, mainly because of their relatively high classification accuracy across several application domains and their ability to handle the high-dimensional, sparse data that is characteristic of textual representations. Recently, there has been increased interest in exploiting background knowledge, such as ontologies and corpus-based statistical knowledge, in text categorization. It has been shown that replacing standard kernel functions, such as the linear kernel, with customized kernel functions that take advantage of this background knowledge can increase the performance of SVM in the text classification domain. Based on this, we propose a novel semantic smoothing kernel for SVM. The suggested approach is based on a meaning measure, which calculates the meaningfulness of terms in the context of classes. The document vectors are smoothed according to these class-contextual meaning values of the terms. Since we make direct use of class information in the smoothing process, the result can be considered a supervised smoothing kernel. The meaning measure is based on the Helmholtz principle from Gestalt theory and has previously been applied to several text mining applications, such as document summarization and feature extraction. However, to the best of our knowledge, ours is the first study to use the meaning measure in a supervised setting to build a semantic kernel for SVM. We evaluated the proposed approach through a large number of experiments on well-known textual datasets and present results for different experimental conditions.
We compare our results with traditional SVM kernels, such as the linear kernel, as well as with several corpus-based semantic kernels. Our results show that the proposed approach outperforms the other kernels in classification performance.
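The general shape of a semantic smoothing kernel can be sketched as follows. This is a minimal illustration, not the paper's method: the class-conditional meaning matrix `M` is assumed as an input here, whereas the paper derives it from the Helmholtz principle; the toy values of `X` and `M` are invented for demonstration.

```python
import numpy as np

def smoothing_kernel(X, M):
    """Sketch of a class-based semantic smoothing kernel.

    X : (n_docs, n_terms) term-frequency matrix.
    M : (n_terms, n_classes) per-class meaning values of terms
        (assumed given; the paper computes these from the
        Helmholtz principle).

    Each document is smoothed into class-meaning space, and the
    kernel is the inner product there: K(x, y) = <xM, yM>.
    """
    S = X @ M          # smooth documents using class-contextual meanings
    return S @ S.T     # Gram matrix of the smoothed vectors

# Two toy documents over three terms, two classes.
X = np.array([[2., 0., 1.],
              [0., 3., 1.]])
M = np.array([[1.0, 0.1],
              [0.1, 1.0],
              [0.5, 0.5]])
K = smoothing_kernel(X, M)
```

Because `K` is a Gram matrix of real vectors, it is symmetric positive semi-definite and could be passed to an SVM that accepts a precomputed kernel (e.g. scikit-learn's `SVC(kernel="precomputed")`).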

    A fault detection strategy for software projects

    The existing software fault prediction models require metrics and fault data belonging to previous software versions or similar software projects. However, there are cases when previous fault data are not available, such as a software company's transition to a new project domain. In such situations, supervised learning methods using fault labels cannot be applied, creating the need for new techniques. We propose a software fault prediction strategy that uses method-level metric thresholds to predict the fault-proneness of unlabelled program modules. The technique was experimentally evaluated on the NASA datasets KC2 and JM1. Some existing approaches apply clustering techniques to group modules, a process followed by an evaluation phase.
This evaluation is performed by a software quality expert, who analyses a representative of each cluster and then labels the modules as fault-prone or not fault-prone. Our approach does not require a human expert during the prediction process. It is a fault prediction strategy that combines method-level metric thresholds as a filtering mechanism with an OR operator as a composition mechanism.
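The threshold-filter plus OR-composition idea can be sketched in a few lines. The metric names and threshold values below are illustrative assumptions, not the thresholds the paper derives for the KC2/JM1 metrics.

```python
# Hypothetical method-level metrics and thresholds (illustrative only;
# the paper derives its own thresholds for the NASA KC2/JM1 metrics).
THRESHOLDS = {
    "loc": 50,                    # lines of code
    "cyclomatic_complexity": 10,  # McCabe complexity
    "halstead_effort": 1000.0,    # Halstead effort
}

def fault_prone(module_metrics, thresholds=THRESHOLDS):
    """Each threshold acts as a filter; the OR operator composes them:
    a module is flagged fault-prone if ANY metric exceeds its threshold.
    No labelled training data or human expert is needed."""
    return any(module_metrics.get(m, 0) > t for m, t in thresholds.items())

print(fault_prone({"loc": 120, "cyclomatic_complexity": 4}))  # True
print(fault_prone({"loc": 30, "cyclomatic_complexity": 4}))   # False
```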

    Morphology based text compression

    With the spread of the Internet, the number of documents in digital media is steadily increasing, and the demand for easier and faster access to this information makes document compression important. Some of the work in the document compression field aims to exploit the morphological structure of the language. In this study, 10 different decomposition methods based on the morphological structure of the language were applied to determine the compression efficiency of Turkish and English documents, and the effects of these methods on compression performance are reported comparatively.
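The basic pipeline can be sketched as: decompose words into morphemes, recode the text so repeated stems and suffixes become frequent tokens, then hand the result to a general-purpose compressor. The fixed-length split below is a crude stand-in for a real morphological analyser (the paper compares 10 actual decomposition methods), and the `+` separator is an invented convention.

```python
import zlib

def split_morphemes(word, stem_len=4):
    """Crude stand-in for a morphological analyser: everything after
    the first `stem_len` characters is treated as a suffix.
    A real system would use language-specific morphology."""
    if len(word) > stem_len:
        return word[:stem_len], word[stem_len:]
    return word, ""

def compress_with_morphology(text):
    """Recode each word as 'stem+suffix' so repeated morphemes recur
    often, then compress with a generic back-end (zlib here)."""
    parts = []
    for w in text.split():
        stem, suffix = split_morphemes(w)
        parts.append(stem + ("+" + suffix if suffix else ""))
    recoded = " ".join(parts)
    return zlib.compress(recoded.encode("utf-8"))
```

The recoding is lossless as long as the separator does not occur in the source text, so the original can be rebuilt after decompression by rejoining stems and suffixes.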

    A Probabilistic Multi-Objective Artificial Bee Colony Algorithm for Gene Selection

    Microarray technology is widely used to report gene expression data. Datasets from this platform characteristically contain many features and few samples. In order to identify significant genes for a particular disease, the high dimensionality of microarray data must be overcome. The Artificial Bee Colony (ABC) Algorithm is a successful meta-heuristic algorithm that solves optimization problems effectively. In this paper, we propose a hybrid gene selection method for discriminatively selecting genes. We propose a new probabilistic binary Artificial Bee Colony Algorithm, namely PrBABC, that is hybridized with three different filter methods. The proposed method is applied to nine microarray datasets in order to detect distinctive genes for classifying cancer data. Results are compared with other well-known meta-heuristic algorithms: the Binary Differential Evolution Algorithm (BinDE), Binary Particle Swarm Optimization Algorithm (BinPSO), and Genetic Algorithm (GA), as well as with other methods in the literature. Experimental results show that the probabilistic self-adaptive learning strategy integrated into the employed-bee phase can boost classification accuracy with a minimal number of genes.
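The core idea of a probabilistic binary bee-colony search for feature subsets can be sketched as follows. This is a heavily simplified illustration under stated assumptions: each bee keeps a probability vector, binary solutions are sampled from it, and the vector is nudged toward the best solution found. The update rule, the toy objective, and all parameters are invented stand-ins, not the paper's PrBABC (which has employed, onlooker, and scout phases and a self-adaptive learning strategy).

```python
import random

def sketch_prbabc(fitness, n_features, n_bees=10, iters=30, seed=0):
    """Simplified probabilistic binary bee-colony search.

    Each bee holds a probability vector p over features; binary
    solutions are sampled Bernoulli(p), scored, and every p is moved
    slightly toward the best solution seen so far (a stand-in for the
    paper's self-adaptive employed-bee update, illustrative only)."""
    rng = random.Random(seed)
    ps = [[0.5] * n_features for _ in range(n_bees)]
    best, best_fit = None, float("-inf")
    for _ in range(iters):
        for p in ps:
            sol = [1 if rng.random() < pi else 0 for pi in p]
            f = fitness(sol)
            if f > best_fit:
                best, best_fit = sol, f
        # Nudge all probability vectors toward the best binary solution.
        for p in ps:
            for j in range(n_features):
                p[j] += 0.1 * (best[j] - p[j])
    return best, best_fit

# Toy objective (hypothetical): reward selecting features {0, 2},
# penalise every extra feature, mimicking "accuracy with few genes".
target = {0, 2}
def fitness(sol):
    chosen = {i for i, b in enumerate(sol) if b}
    return len(chosen & target) - 0.5 * len(chosen - target)

best, f = sketch_prbabc(fitness, n_features=5)
```

In a real gene-selection setting, `fitness` would be replaced by a classifier's cross-validated accuracy on the gene subset, optionally combined with a subset-size penalty.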

    Behcet's disease and renal failure

    Background. The aims of this study were (i) to investigate the prevalence of Behcet's disease (BD) among dialysis patients in Turkey, (ii) to report the clinical characteristics of patients with BD and end-stage renal disease (ESRD), (iii) to evaluate the effect of ESRD on the course and activity of BD and (iv) to analyse the published data about BD and renal failure. Methods. A questionnaire investigating BD among dialysis patients was submitted to 350 dialysis centres, and we obtained data for 20 596 patients from 331 dialysis centres. We submitted a second questionnaire regarding the clinical characteristics of the patients with BD and ESRD. The PubMed and Web of Science databases were used for the analysis of BD and renal failure. Results. Fourteen patients with BD were identified, giving a prevalence of 0.07% among the 20 596 dialysis patients in Turkey. None of the patients developed a new manifestation of BD after initiation of haemodialysis treatment. The analysis of previously published data on renal BD identified 67 patients with renal failure. Conclusions. The most common cause of renal failure in BD is amyloidosis. Routine urine analysis and measurement of serum creatinine and blood urea nitrogen levels are needed for early diagnosis. Vascular access-related problems are common, and the activity of BD appears to decrease in patients with ESRD after initiation of haemodialysis.